156 research outputs found

    Evaluation of statistical methods, modeling, and multiple testing in RNA-seq studies

    Recent Next Generation Sequencing methods provide a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Due to this feature of RNA sequencing (RNA-seq) data, appropriate statistical inference methods are required. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA-seq data, its appropriateness in the application to genetic studies has not been exhaustively evaluated. Additionally, adjusting for covariates that have an unknown relationship with expression of a gene has not been extensively evaluated in RNA-seq studies using the NB framework. Finally, the dependent structures in RNA-seq data may violate the assumptions of some multiple testing correction methods. In this dissertation, we suggest an alternative regression method, evaluate the effect of covariates, and compare various multiple testing correction methods. We conduct simulation studies and apply these methods to a real data set. First, we suggest Firth’s logistic regression for detecting differentially expressed genes in RNA-seq data. We also recommend the data adaptive method that estimates a recalibrated distribution of test statistics. Firth’s logistic regression exhibits an appropriately controlled Type-I error rate using the data adaptive method and shows comparable power to NB regression in simulation studies. Next, we evaluate the effect of disease-associated covariates where the relationship between the covariate and gene expression is unknown. Although the power of NB and Firth’s logistic regression is decreased as disease-associated covariates are added in a model, Type-I error rates are well controlled in Firth’s logistic regression if the relationship between a covariate and disease is not strong. Finally, we compare multiple testing correction methods that control family-wise error rates and impose false discovery rates. The evaluation reveals that an understanding of study designs, RNA-seq data, and the consequences of applying specific regression and multiple testing correction methods are very important factors to control family-wise error rates or false discovery rates. We believe our statistical investigations will enrich gene expression studies and influence related statistical methods

    Using linkage analysis of large pedigrees to guide association analyses

    To date, genome-wide association studies have yielded discoveries of common variants that partly explain familial aggregation of diseases and traits. Researchers are now turning their attention to less common variants because the price of sequencing has dropped drastically. However, because sequencing of the whole genome in large samples is costly, great care must be taken to prioritize which samples and which genomic regions are selected for sequencing. We are interested in identifying genomic regions for deep sequencing using large multiplex families collected as part of earlier linkage studies. We incorporate linkage analysis into our search for Q1-associated alleles. Overall, we found that power was low for both whole-exome and linkage-guided sequencing analysis. By restricting sequencing to regions with high LOD peaks, we found fewer associated single-nucleotide polymorphisms than by using whole-exome sequencing. However, incorporating linkage analysis enabled us to detect more than half of the associated susceptibility loci (52%) that would have been identified by whole-exome sequencing while examining only 2.5% of the exome. This result suggests that incorporating linkage results from large multiplex families might greatly increase the efficiency of sequencing to detect trait-associated alleles in complex disease

    Growth mixture modeling as an exploratory analysis tool in longitudinal quantitative trait loci analysis

    We examined the properties of growth mixture modeling in finding longitudinal quantitative trait loci in a genome-wide association study. Two software packages are commonly used in these analyses: Mplus and the SAS TRAJ procedure. We analyzed the 200 replicates of the simulated data with these programs using three tests: the likelihood-ratio test statistic, a direct test of genetic model coefficients, and the chi-square test classifying subjects based on the trajectory model's posterior Bayesian probability. The Mplus program was not effective in this application due to its computational demands. The distributions of these tests applied to genes not related to the trait were sensitive to departures from Hardy-Weinberg equilibrium. The likelihood-ratio test statistic was not usable in this application because its distribution was far from the expected asymptotic distributions when applied to markers with no genetic relation to the quantitative trait. The other two tests were satisfactory. Power was still substantial when we used markers near the gene rather than the gene itself. That is, growth mixture modeling may be useful in genome-wide association studies. For markers near the actual gene, there was somewhat greater power for the direct test of the coefficients and lesser power for the posterior Bayesian probability chi-square test

    Atrial fibrillation genetic risk differentiates cardioembolic stroke from other stroke subtypes

    Objective: We sought to assess whether genetic risk factors for atrial fibrillation (AF) can explain cardioembolic stroke risk. Methods: We evaluated genetic correlations between a previous genetic study of AF and AF in the presence of cardioembolic stroke using genome-wide genotypes from the Stroke Genetics Network (N = 3,190 AF cases, 3,000 cardioembolic stroke cases, and 28,026 referents). We tested whether a previously validated AF polygenic risk score (PRS) associated with cardioembolic and other stroke subtypes after accounting for AF clinical risk factors. Results: We observed a strong correlation between previously reported genetic risk for AF, AF in the presence of stroke, and cardioembolic stroke (Pearson r = 0.77 and 0.76, respectively, across SNPs with p < 4.4 × 10⁻⁴ in the previous AF meta-analysis). An AF PRS, adjusted for clinical AF risk factors, was associated with cardioembolic stroke (odds ratio [OR] per SD = 1.40, p = 1.45 × 10⁻⁴⁸), explaining ∼20% of the heritable component of cardioembolic stroke risk. The AF PRS was also associated with stroke of undetermined cause (OR per SD = 1.07, p = 0.004), but no other primary stroke subtypes (all p > 0.1). Conclusions: Genetic risk of AF is associated with cardioembolic stroke, independent of clinical risk factors. Studies are warranted to determine whether AF genetic risk can serve as a biomarker for strokes caused by AF

    Associations of NINJ2 sequence variants with incident ischemic stroke in the Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) consortium

    Background: Stroke, the leading neurologic cause of death and disability, has a substantial genetic component. We previously conducted a genome-wide association study (GWAS) in four prospective studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium and demonstrated that sequence variants near the NINJ2 gene are associated with incident ischemic stroke. Here, we sought to fine-map functional variants in the region and evaluate the contribution of rare variants to ischemic stroke risk. Methods and Results: We sequenced 196 kb around NINJ2 on chromosome 12p13 among 3,986 European ancestry participants, including 475 ischemic stroke cases, from the Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and Framingham Heart Study. Meta-analyses of single-variant tests for 425 common variants (minor allele frequency [MAF] ≥ 1%) confirmed the original GWAS results and identified an independent intronic variant, rs34166160 (MAF = 0.012), most significantly associated with incident ischemic stroke (HR = 1.80, p = 0.0003). Aggregating 278 putatively-functional variants with MAF ≤ 1% using count statistics, we observed a nominally statistically significant association, with the burden of rare NINJ2 variants contributing to decreased ischemic stroke incidence (HR = 0.81; p = 0.026). Conclusion: Common and rare variants in the NINJ2 region were nominally associated with incident ischemic stroke among a subset of CHARGE participants. Allelic heterogeneity at this locus, caused by multiple rare, low frequency, and common variants with disparate effects on risk, may explain the difficulties in replicating the original GWAS results. Additional studies that take into account the complex allelic architecture at this locus are needed to confirm these findings

    Gemcitabine-Based Neoadjuvant Treatment in Borderline Resectable Pancreatic Ductal Adenocarcinoma: A Meta-Analysis of Individual Patient Data

    Background: Non-randomized studies have investigated multi-agent gemcitabine-based neoadjuvant therapies (GEM-NAT) in borderline resectable pancreatic ductal adenocarcinoma (BR-PDAC). Treatment sequencing and specific elements of neoadjuvant treatment are still under investigation. The present meta-analysis aims to assess the effectiveness of GEM-NAT on overall survival (OS) in BR-PDAC. Patients and Methods: A meta-analysis of individual participant data (IPD) on GEM-NAT for BR-PDAC was performed. The primary outcome was OS after treatment with GEM-based chemotherapy. In the IPD analysis, data were reappraised and confirmed as BR-PDAC on provided radiological data. Results: Six studies investigating GEM-NAT were included in the IPD meta-analysis. The IPD meta-analysis was conducted on 271 patients who received GEM-NAT. Pooled median patient-level OS was 22.2 months (95% CI 19.1–25.2). Median OS was 27.8 months (95% CI 23.9–31.6) in the patients who received GEM-NAT followed by resection compared to 15.4 months (95% CI 12.3–18.4) for GEM-NAT without resection and 13.0 months (95% CI 7.4–18.5) in the group of patients who received upfront surgery (p < 0.0001). R0 rates ranged between 81 and 95% (I² = 0%, p = 0.64). Overall survival in the R0 group was 29.3 months (95% CI 24.3–34.2) vs. 16.2 months (95% CI 7.9–24.5) in the R1 group (p = 0.001). Conclusions: The present study is the first meta-analysis combining IPD from a number of international centers with BR-PDAC in a cohort that underwent multi-agent gemcitabine neoadjuvant therapy (GEM-NAT) before surgery. GEM-NAT followed by surgical resection improve sur

    Genetics of myocardial interstitial fibrosis in the human heart and association with disease

    Myocardial interstitial fibrosis is associated with cardiovascular disease and adverse prognosis. Here, to investigate the biological pathways that underlie fibrosis in the human heart, we developed a machine learning model to measure native myocardial T1 time, a marker of myocardial fibrosis, in 41,505 UK Biobank participants who underwent cardiac magnetic resonance imaging. Greater T1 time was associated with diabetes mellitus, renal disease, aortic stenosis, cardiomyopathy, heart failure, atrial fibrillation, conduction disease and rheumatoid arthritis. Genome-wide association analysis identified 11 independent loci associated with T1 time. The identified loci implicated genes involved in glucose transport (SLC2A12), iron homeostasis (HFE, TMPRSS6), tissue repair (ADAMTSL1, VEGFC), oxidative stress (SOD2), cardiac hypertrophy (MYH7B) and calcium signaling (CAMK2D). Using a transforming growth factor β1-mediated cardiac fibroblast activation assay, we found that 9 of the 11 loci consisted of genes that exhibited temporal changes in expression or open chromatin conformation supporting their biological relevance to myofibroblast cell state acquisition. By harnessing machine learning to perform large-scale quantification of myocardial interstitial fibrosis using cardiac imaging, we validate associations between cardiac fibrosis and disease, and identify new biologically relevant pathways underlying fibrosis.

    Transcriptional and Cellular Diversity of the Human Heart

    Background: The human heart requires a complex ensemble of specialized cell types to perform its essential function. A greater knowledge of the intricate cellular milieu of the heart is critical to increase our understanding of cardiac homeostasis and pathology. As recent advances in low-input RNA sequencing have allowed definitions of cellular transcriptomes at single-cell resolution at scale, we have applied these approaches to assess the cellular and transcriptional diversity of the nonfailing human heart. Methods: Microfluidic encapsulation and barcoding was used to perform single nuclear RNA sequencing with samples from 7 human donors, selected for their absence of overt cardiac disease. Individual nuclear transcriptomes were then clustered based on transcriptional profiles of highly variable genes. These clusters were used as the basis for between-chamber and between-sex differential gene expression analyses and intersection with genetic and pharmacologic data. Results: We sequenced the transcriptomes of 287 269 single cardiac nuclei, revealing 9 major cell types and 20 subclusters of cell types within the human heart. Cellular subclasses include 2 distinct groups of resident macrophages, 4 endothelial subtypes, and 2 fibroblast subsets. Comparisons of cellular transcriptomes by cardiac chamber or sex reveal diversity not only in cardiomyocyte transcriptional programs but also in subtypes involved in extracellular matrix remodeling and vascularization. Using genetic association data, we identified strong enrichment for the role of cell subtypes in cardiac traits and diseases. Intersection of our data set with genes on cardiac clinical testing panels and the druggable genome reveals striking patterns of cellular specificity. Conclusions: Using large-scale single nuclei RNA sequencing, we defined the transcriptional and cellular diversity in the normal human heart. Our identification of discrete cell subtypes and differentially expressed genes within the heart will ultimately facilitate the development of new therapeutics for cardiovascular diseases

    Atrial fibrillation genetic risk differentiates cardioembolic stroke from other stroke subtypes

    Objective: We sought to assess whether genetic risk factors for atrial fibrillation (AF) can explain cardioembolic stroke risk. Methods: We evaluated genetic correlations between a previous genetic study of AF and AF in the presence of cardioembolic stroke using genome-wide genotypes from the Stroke Genetics Network (N = 3,190 AF cases, 3,000 cardioembolic stroke cases, and 28,026 referents). We tested whether a previously validated AF polygenic risk score (PRS) associated with cardioembolic and other stroke subtypes after accounting for AF clinical risk factors. Results: We observed a strong correlation between previously reported genetic risk for AF, AF in the presence of stroke, and cardioembolic stroke (Pearson r = 0.77 and 0.76, respectively, across SNPs with p < 4.4 × 10⁻⁴ in the previous AF meta-analysis). An AF PRS, adjusted for clinical AF risk factors, was associated with cardioembolic stroke (odds ratio [OR] per SD = 1.40, p = 1.45 × 10⁻⁴⁸), explaining ∼20% of the heritable component of cardioembolic stroke risk. The AF PRS was also associated with stroke of undetermined cause (OR per SD = 1.07, p = 0.004), but no other primary stroke subtypes (all p > 0.1). Conclusions: Genetic risk of AF is associated with cardioembolic stroke, independent of clinical risk factors. Studies are warranted to determine whether AF genetic risk can serve as a biomarker for strokes caused by AF